64 research outputs found

    Ordering the suggestions of a spellchecker without using context.

    Having located a misspelling, a spellchecker generally offers some suggestions for the intended word. Even without using context, a spellchecker can draw on various types of information in ordering its suggestions. A series of experiments is described, beginning with a basic corrector that implements a well-known algorithm for reversing single simple errors, and making successive enhancements to take account of substring matches, pronunciation, known error patterns, syllable structure and word frequency. The improvement in the ordering produced by each enhancement is measured on a large corpus of misspellings. The final version is tested on other corpora against a widely used commercial spellchecker and a research prototype.
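
As a rough illustration of the starting point described in this abstract, the sketch below reverses single simple errors and orders the results by word frequency alone. It assumes the four classic single-error types (an omitted letter, an extra letter, a wrong letter, two adjacent letters swapped); the abstract does not name the algorithm, so this is an assumption rather than the paper's implementation. `WORD_FREQ`, `single_error_variants` and `suggest` are hypothetical names, and the frequencies are made-up values.

```python
from string import ascii_lowercase

# Hypothetical toy dictionary: word -> relative frequency (illustrative values only).
WORD_FREQ = {"the": 1000, "they": 400, "there": 350, "then": 300, "ten": 50, "tea": 20}


def single_error_variants(word):
    """Return every string that differs from `word` by one simple edit:
    one letter removed, one letter added, one letter changed,
    or two adjacent letters swapped."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ascii_lowercase}
    inserts = {a + c + b for a, b in splits for c in ascii_lowercase}
    return (deletes | transposes | replaces | inserts) - {word}


def suggest(misspelling, word_freq=WORD_FREQ):
    """Dictionary words one simple edit away, most frequent first
    (ties broken alphabetically)."""
    candidates = [w for w in single_error_variants(misspelling) if w in word_freq]
    return sorted(candidates, key=lambda w: (-word_freq[w], w))


print(suggest("teh"))
```

Frequency is only one of the ordering signals listed in the abstract; substring matches, pronunciation, known error patterns and syllable structure are not modelled here.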

    A large list of confusion sets for spellchecking assessed against a corpus of real-word errors

    One of the methods that has been proposed for dealing with real-word errors (errors that occur when a correctly spelled word is substituted for the one intended) is the "confusion-set" approach - a confusion set being a small group of words that are likely to be confused with one another. Using a list of confusion sets drawn up in advance, a spellchecker, on finding one of these words in a text, can assess whether one of the other members of its set would be a better fit and, if it appears to be so, propose that word as a correction. Much of the research using this approach has suffered from two weaknesses. The first is the small number of confusion sets used. The second is that systems have largely been tested on artificial errors. In this paper we address these two weaknesses. We describe the creation of a realistically sized list of confusion sets, then the assembling of a corpus of real-word errors, and then we assess the potential of that list in relation to that corpus.
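
The confusion-set procedure outlined in this abstract can be sketched in a few lines. The abstract does not say how "a better fit" is measured, so the sketch below assumes a simple score based on counts of neighbouring word pairs. `CONFUSION_SETS`, `BIGRAM_COUNTS`, `fit` and `check` are hypothetical names, and the counts are invented for illustration.

```python
# Hypothetical confusion sets and bigram counts (illustrative values only).
CONFUSION_SETS = [{"loose", "lose"}, {"their", "there"}]
BIGRAM_COUNTS = {
    ("you", "lose"): 40,
    ("you", "loose"): 1,
    ("lose", "your"): 30,
    ("loose", "your"): 1,
    ("loose", "thread"): 5,
}


def fit(prev_word, word, next_word):
    """Score how well `word` fits between its neighbours (assumed measure)."""
    return (BIGRAM_COUNTS.get((prev_word, word), 0)
            + BIGRAM_COUNTS.get((word, next_word), 0))


def check(tokens):
    """Return (position, word_found, proposed_word) for every token that has
    a confusion-set partner scoring strictly higher in the same context."""
    proposals = []
    for i, tok in enumerate(tokens):
        prev_word = tokens[i - 1] if i > 0 else None
        next_word = tokens[i + 1] if i + 1 < len(tokens) else None
        for cset in CONFUSION_SETS:
            if tok not in cset:
                continue
            # sorted() makes the choice deterministic when scores tie
            best = max(sorted(cset), key=lambda w: fit(prev_word, w, next_word))
            if best != tok and fit(prev_word, best, next_word) > fit(prev_word, tok, next_word):
                proposals.append((i, tok, best))
    return proposals


print(check("you loose your bonus".split()))
```

A word is flagged only when another member of its set scores strictly higher, so a correctly used "loose" (as in "a loose thread") is left alone.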

    The adaptation of an English spellchecker for Japanese writers

    It has been pointed out that the spelling errors made by second-language writers writing in English have features that are to some extent characteristic of their first language, and the suggestion has been made that a spellchecker could be adapted to take account of these features. In the work reported here, a corpus of spelling errors made by Japanese writers writing in English was compared with a corpus of errors made by native speakers. While the great majority of errors were common to the two corpora, some distinctively Japanese error patterns were evident against this common background, notably a difficulty in deciding between the letters b and v, and the letters l and r, and a tendency to add syllables. A spellchecker that had been developed for native speakers of English was adapted to cope with these errors. A brief account is given of the spellchecker’s mode of operation to indicate how it lent itself to modifications of this kind. The native-speaker spellchecker and the Japanese-adapted version were run over the error corpora and the results show that these adaptations produced a modest but worthwhile improvement to the spellchecker’s performance in correcting Japanese-made errors.
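
One way to picture the b/v and l/r adaptation mentioned in this abstract is a candidate generator that tries exactly those letter swaps, so that a corrector could give such candidates priority for Japanese writers. This is a minimal sketch under that assumption, not the mechanism of the spellchecker described in the paper. `SWAPS`, `swap_variants` and `l1_candidates` are hypothetical names.

```python
# Letter pairs reported in the abstract as hard to choose between.
SWAPS = {"b": "v", "v": "b", "l": "r", "r": "l"}


def swap_variants(word):
    """All strings obtained by exchanging one b/v or l/r letter in `word`."""
    return {
        word[:i] + SWAPS[ch] + word[i + 1:]
        for i, ch in enumerate(word)
        if ch in SWAPS
    }


def l1_candidates(misspelling, dictionary):
    """Dictionary words reachable by a single b/v or l/r swap, alphabetically."""
    return sorted(swap_variants(misspelling) & set(dictionary))


print(l1_candidates("grobe", {"globe", "grove", "glove"}))
```

The tendency to add syllables, also noted in the abstract, is not covered by this sketch.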

    A social forecast revisited

    In 1971, the authors produced a 30-year forecast of leisure in the UK. In 2001 they obtained survey data for comparison with the forecasts. The paper presents the original forecasts and describes the methods used to produce them, assesses their accuracy in the light of the survey data, and concludes with some reflections on the underlying forecasting methodology and on changes in leisure patterns.

    BNC! Handle with care! Spelling and tagging errors in the BNC

    "You loose your no-claims bonus," instead of "You lose your no-claims bonus," is an example of a real-word spelling error. One way to enable a spellchecker to detect such errors is to prime it with information about likely features of the context for "loose" (verb) as compared with "lose". To this end, we extracted all the examples of "loose" used as a verb from the BNC (World edition, text). There were, apparently, 159 occurrences of "loose" (VVB or VVI). However, on inspection, well over half of these were not verbs at all (tagging errors) and over half of the rest were misspellings of "lose". Only about 15% were actual occurrences of "loose" as a verb. This prompted us to undertake a small investigation into errors in the BNC. We report on some words that occur more often as misspellings than in their own right - only one of the 63 occurrences of "ail", for example, is correct (possibly OCR errors) - and some words that are always mistagged, such as "haulier" and "glazier" (never NN), and "hanker" and "loiter" (never VV). We note in particular that, if a rare word resembles a common word (in spelling), it is more likely to appear as a misspelling of the common word than as a correct spelling of the rare word. These cases require some modification of an earlier conclusion (Damerau and Mays, 1989) on misspellings of rare words. We conclude with a discussion of the desirability, or otherwise, of correcting errors in corpora such as the BNC. The results may be of interest to people who use the BNC as training data or for teaching.

    English spelling and the computer

    The first half of the book is about spelling, the second about computers. Chapter Two describes how English spelling came to be in the state that it’s in today. In Chapter Three I summarize the debate between those who propose radical change to the system and those who favour keeping it as it is, and I show how computerized correction can be seen as providing at least some of the benefits that have been claimed for spelling reform. Too much of the literature on computerized spellcheckers describes tests based on collections of artificially created errors; Chapter Four looks at the sorts of misspellings that people actually make, to see more clearly the problems that a spellchecker has to face. Chapter Five looks more closely at the errors that people make when they don’t know how to spell a word, and Chapter Six at the errors that people make when they know perfectly well how to spell a word but for some reason write or type something else. Chapter Seven begins the second part of the book with a description of the methods that have been devised over the last thirty years for getting computers to detect and correct spelling errors. Its conclusion is that spellcheckers have some way to go before they can do the job we would like them to do. Chapters Eight to Ten describe a spellchecker that I have designed which attempts to address some of the remaining problems, especially those presented by badly spelt text. In 1982, when I began this research, there were no spellcheckers that would do anything useful with a sentence such as, ‘You shud try to rember all ways to youz a lifejacket when yotting.’ That my spellchecker corrects this perfectly (which it does) is less impressive now, I have to admit, than it would have been then, simply because there are now a few spellcheckers on the market which do make a reasonable attempt at errors of that kind. My spellchecker does, however, handle some classes of errors that other spellcheckers do not perform well on, and Chapter Eleven concludes the book with the results of some comparative tests, a few reflections on my spellchecker’s shortcomings and some speculations on possible developments.

    A community project in Notting Dale

    Professor Hintikka on Descartes’ “Cogito”

    Spellcheckers

    Techniques of computer spellchecking from the 1950s to the 2000s.

    Practical research in distance teaching: a handbook for developing countries

    Author’s preface, 2008. I wrote this book on my return from Lesotho, Africa, where I worked from 1974 to 1977 for the Lesotho Distance-Teaching Centre. We undertook many, mostly small, research projects to guide our distance teaching, and it was on this aspect of our work that I was asked to write. The book is about doing practical research. It is not a summary of research findings on distance teaching, nor is it a digest of the literature on educational research and evaluation. It is the advice that I would give to someone who wanted to know how to go about doing research in distance teaching, especially in a developing country. The first two chapters and the last are about linking research and action; the rest are about doing research. Chapters 3 to 11 describe methods common to social research in general but I use examples from distance teaching to illustrate them. The remaining chapters look at some of the tasks that research can perform in distance teaching, such as pretesting instructional materials or evaluating a campaign. Most of the advice in this book is still applicable today, but one topic is conspicuously absent - computer technology, which has changed beyond recognition since the book was published. The punched-card technology described briefly on pages 90 to 96 now belongs in a science museum. The book has very little to say about the use of computers for the analysis of research results, and nothing at all about the use of computers and the internet in the delivery of education.